ECEN 4632: Intro to Digital Filtering
Professor Francois Meyer
Final Project
Jesse Tao and Eric Daugherty
May 3rd, 2018
Part 1)
Below are the figures generated using the function with the Hann window. The code used to
generate the plots is found in the appendix at the end of the report. Comments on the results of
each track for the stft are given above each figure. Comments for the rest of the figures in the
report will also use the Hann window. Stft takes in the sampled signal as well as which window
will be used. The Hann window in the case of this report, but the Kaiser and Blackman window
could be used as well by changing the second argument of the function. The function produces a
time-frequency matrix where each column of the matrix is the frequency response of 1024
sample section of a song, and the sections overlap by 768 samples each. All songs are analyzed
individually in the following section.
For track201-classical.wav we can see that there are many different notes being played which
makes sense as this is piece is better known as Violin Concerto No. 1 in A minor by Bach. This
makes sense for a concerto, as the definition of a concerto is a musical piece where there is a
main instrument, which in this case is a violin, accompanied by an orchestra which can include
many different instruments. It can be seen that the main spots of red at lower frequencies is likely
the steady beat and harmony that can be heard from the cello, while the higher frequencies are
the melodies of the first set of violins and the overtones of the second set of violins. It is also
important to note that while it is not as noticeable to the ear, the chart shows as slight decrease in
frequency as time goes on which can likely be attributed to the nature of string instruments as the
vibrations caused between the string and the bow can cause the string to loosen, thus generating
lower frequencies than intended.
In track204-classical.wav, the melody can be clearly seen as there is only one instrument playing
which is a piano. As this is just a standard etude (practice exercise) the track follows a very
standard pattern of increasing frequencies and then decreasing frequencies. We can see some
spikes in frequency as the etude isn’t completely boring and does scale up to some harmonics of
the same note. The main takeaway is that it is easy figure out the melody of a song when there is
only one instrument and the structure is very regimented as this etude is designed for a pianist to
practice their arpeggios (set of notes played in an increasing frequency and then decreasing
frequency corresponding to the intended melody for a song).
For track370-electronic.wav, it is just a random jumble of electronic sounds put together and
from the chart we can see that as there is not really set structure, and there just seems to be noise
at a wide variety of frequencies. It is unclear what the user should be listening to as there seems
to be what sounds like drums tapping in quick succession, but it is generally overshadowed by
the very electronic and unnatural sounds which vary a lot in frequency even just for less than a
second of time.
Track396-electronic.wav has a very strong bass in the background which can be seen by all the
red at low frequencies, it does however cut out around 120 seconds which does make sense as
there is only the woman singing at that point in the song. As for the light red at higher
frequencies, this can either be attributed to the woman’s voice of the other electronic
instrumentation heard throughout the song that generates the melody. There is very quick
switching between notes for the melody which is the primary reason that there seems to be a lot
of noise for short periods of time.
In this jazz track we can hear something that sounds like a synthesizer and steady drums and
symbols in the background. There is somewhat of a structure to the synthesizer which looks to
repeat every 15 seconds, which makes sense as the musician uses the same series of 5 notes very
often throughout the track. Near the end the structure seems to be loss which can be heard in the
song as the musician decides to show off their skills and abandon the structure that was set at the
beginning of the track.
For this jazz track, again the synthesizer seems to hold a steady pattern, but it is most difficult to
see in this chart as the song is higher tempo (more fast-paced) as the drums are more high
energy. Between 90 and 120 seconds the track seems to lose it’s rhythm which can be heard as
the synthesizer start to vary it’s notes quite a bit and the drums become very soft. But the track
pick right back up to the original rhythm right after.
For this metal track, there is a clear bass throughout the song, and a vocalist singing the melody.
Looking at the Fourier transform, no real structure cannot be seen for this track, which is
common in metal music and there is a lot of noise, but the rhythm is difficult to figure out as the
bass keeps constantly changing the tempo of the song pretty sporadically. The song does seem to
cover a wide range of frequencies as the majority of the plot is orange and red signifying a wide
range of harmonics.
This metal track seems to have even more noise, and even though the bass seems to have a bit
more structure than the previous track, but the structure does not retain after 30 seconds in the
song. The plot is even more red than the previous range, which means every harmonic is equally
strong which is quite impressive from this musician. From 60 to 90 seconds we can see some
ripples, which could be attributed to the slides that can be heard in this area where it seems like
an electric guitar is just sliding their finger quickly down the frets (lines on guitar to determine
what note is played) to produce this unique sound.
In this rock track there seems to be a standard 4 notes that are played throughout the song which
gives a structure in the lower frequencies that can kind of be seen, but since the 4 notes seem to
happen at random intervals they are a bit difficult to see in the plot. At higher frequencies, we
can see the woman’s voice as well as the harmonics that are produced with the other instruments
in the background.
In this rock track, the structure is clear to see with the ukulele like instrument in the background
as it plays a very melodic line that is easy to follow, the structure is more difficult to follow
around 45 to 75 seconds and 110 and 155 seconds due to the woman’s voice and the additional
harmonics which shows in the plot, but it is still pretty clear and there is just more noise which
makes sense with the woman’s voice and ukulele going at the same time.
For this world track, the plot shows there is not much at the beginning which makes sense as it is
just the running stream of water in the background. As the track goes on a flute is introduced
which produces a steady tone, which can be seen by the dark red sections as low frequencies
throughout the song. There is not much else in the song which the plot represents so we can
safely say this plot is correct.
.
For this world track we can see a steady structure in the plot, which can also be heard in the
song.There are also clear harmonics which makes sense as it is just a couple instruments in the
background. The song mostly follows the same structure throughout with some changes in
frequency in the plot. Even though the sounds produces in the track might not be the most
appealing to listen to, one can still hear a clear melody being played which is reflected in the
plot.
Part 2)
Below are the plots generated by pspec. The code used to generate the plots is found in the
appendix at the end of the report. Pspec takes in the sampled signal which is to be plotted. To
change which window is used the second argument of stft inside the pspec function can be
changed. The structure of these plots is very similar to those produced by stft. By squaring each
data point the large values become larger and the small values become smaller making it easier
to see where the salient features of each song are. All songs are analyzed individually in the
following section and comments for the figures generated by pspec are included above each
figure.
Parts 3 and 4)
Below are the plots for peakpwr. The code used to generate the plots is found in the appendix at
the end of the report. Peakpwr takes in the sampled signal being plotted. Only the cell with the
maximum value is preserved in each column, this is the most prominent frequency in this section
of 1024 samples. We only look at frequencies under 70 Hz in our plots as most of the maximums
are between the range of 0-70 Hz. All songs are analyzed individually in the following section
and comments for the figures generated by peakpwr are included above each figure.
Part 5)
Part 6)
Score Matrix
Bakdhsasdfhk all graphs comments are heere
Inshert hella comments
Appendix Parts 1-4
This is the main function used to generate all plots for the report as well as documentation for
how to generate these plots using the different windows: Blackman and Kaiser.
Main script parts 1-4
close all;
clear all;
clc;
filename = dir('*.wav');
numofSongs = length(filename);
for i = 1:numofSongs
%for j = 1:3
%use loop above to run through each window: 1) Hann 2) Blackman 3) Kaiser
%Part 1
[W, size_x] = stft(filename(i).name, 1);
W_abs = 10*log10(abs(W));
%Part 2
P = pspec(filename(i).name);
P_abs = 10*log10(abs(P));
%Part 3
PP = peakpwr(filename(i).name);
PP_abs = 10*log10(abs(PP));
%colormap jet
%imagesc(PP_abs)
%caxis ([-40 20])
%colorbar
%P = pspec(filename);
%PP = peakpwr(filename);
%plot and save all imagesc needed for report
h = figure;
W_abs = 10*log10(abs(W));
xticklabels = 0:30:round(size_x);
xticks = linspace(1, size(W_abs, 2), numel(xticklabels));
colormap jet
imagesc(W_abs)
caxis ([-40 20])
colorbar
set(gca, 'YDir', 'normal', 'XTick', xticks, 'XTickLabel',xticklabels)
xlabel('Time (seconds)');
ylabel('Frequency (Hz)');
title({'Fourier Transform of track: ', filename(i).name});
saveas(gca,['Abs(y)', filename(i).name , '.png']);
close(h);
h = figure;
xticklabels = 0:30:round(size_x);
xticks = linspace(1, size(W_abs, 2), numel(xticklabels));
colormap jet
imagesc(P_abs)
caxis ([-40 20])
colorbar
set(gca, 'YDir', 'normal', 'XTick', xticks, 'XTickLabel',xticklabels)
xlabel('Time (seconds)');
ylabel('Frequency (Hz)');
title({'Power Spec of track: ', filename(i).name});
saveas(gca,['P(y)', filename(i).name , '.png']);
close(h);
h = figure;
xticklabels = 0:30:round(size_x);
xticks = linspace(1, size(W_abs, 2), numel(xticklabels));
colormap jet
imagesc(PP_abs)
caxis ([-40 -20])
ylim([0 50])
colorbar
set(gca, 'YDir', 'normal', 'XTick', xticks, 'XTickLabel',xticklabels)
xlabel('Time (seconds)');
ylabel('Frequency (Hz)');
title({'Peak Power of track: ', filename(i).name});
saveas(gca,['PP(y)', filename(i).name , '.png']);
close(h);
end
stft.m
function [W, size_x] = stft(x,window)
%stft takes the fourier transform of an audio signal x
%and returns a 'time-frequency' matrix W where W is the Fourier
%transform of the windowed frame xn
x = audioread(x); % gets the samples of the file 'x'
N = 1024;
nframes = floor(2*length(x)/N);
xn = buffer(x,N, N/4); %creates overlapping samples of size N/4
W = zeros(N/2 + 1, length(xn));
size_x = length(x)/11025; %used to express samples in seconds for plots
switch window
case 1
w=hann(N); % Hann (Hanning) window
case 2
w=blackman(N); % Blackman window
case 3
w=kaiser(N, 0.5); % Kaiser window
otherwise
w=kaiser(N, 0.5); % defaults to Kaiser window
end
for i = 1:length(xn)
Y(:,i) = fft(xn(:,i).*w); %compute the fast Fourier transform
end
K = N/2 + 1;
for i = 1:length(xn)
W(1:K,i) = Y(1:K,i); %generate the 'time-frequency matrix W
end
end
pspec.m
function [P] = pspec(x)
%function pspec computes the magnitude squared of the
%Fourier transform of frame n
i = stft(x,1);
P = abs(i);
P = P.^2;
end
peakpwr.m
function [PP] = peakpwr(x)
%peakpwr finds the maximum of pspec of frame n
x = pspec(x);
PP = zeros(size(x));
for i = 1:length(x)
[values,locations] = max(x(:,i));
PP(locations,i) = values;
end
Appendix Parts 5-6
Main script parts 5-6
close all;
clear all;
clc;
%generate file names
filename = dir('*.wav');
numofSongs = length(filename);
%for loop goes through all the tracks
for i = 1:10
[x,y] = audioread(filename(i).name); %read track
[iFz,frameSize, nFrames] = ifz_3(x',y,2048); %take the IF
[W,size_t] = stft_2(x,fs,4); %takes stft
W_abs = 10*log10(abs(W));
C(:,:,i) = chromatic(x',y, 1024); %takes chroma/NPCP
c_sum(:,i) = sum(C(:,:,i), 2);
%generate plots for IF
h = figure;
xticklabels = 0:60:ceil(length(x)/y);
yticklabels = 0:500:ceil(y/2);
xticks = linspace(1, nFrames, numel(xticklabels));
yticks = linspace(1, frameSize, numel(yticklabels));
axis xy;
colormap jet
imagesc(iFz)
caxis ([1200 3200])
c = colorbar;
c.TicksMode = 'manual';
c.Ticks = [-2000 0 2000 4000 6000 8000];
c.TickLabelsMode = 'manual';
c.TickLabels = {'-2000', '0', '2000', '4000', '6000', '8000'};
set(gca, 'YDir', 'normal');
xlabel('Time (seconds)');
ylabel('Frequency (Hz)');
title({'Instantaneous frequency of track: ' filename(i).name});
saveas(gca,['Chroma(y)', filename(i).name , '.png']);
close(h);
%generate chroma plots
h=figure;
C_abs = 10*log10(abs(C));
imagesc(flipud(C_abs));
ax=gca;
ax.YTickLabel=fliplr({'A ','A#','B ','C ','C#','D ','D#','E ',...
'F ','F#','G ','G#'});
ax.YTick=linspace(1,12,12);
colormap 'jet';
title({'Chroma :'; filename(i).name});
xlabel('frames');
ylabel('Note');
colorbar;
caxis([-60 0])
saveas(gca,['NPCP' filename(i).name '.png']);
close(h);
end
% [x1,fs] = audioread('1.wav');
% x2 = audioread('3.wav');
%
% [C1] = chromatic(x1',fs,1024);
% [C2] = chromatic(x2',fs,1024);
%
% c1 = sum(C1,2);
% c2 = sum(C2,2);
%
% CC = corr(C1',C1');
% cc = diag(CC);
% score = sum(abs(cc));
% C_1 = C(:,:,1);
% C_2 = C(:,:,2);
%generate correlation matrix for the 10 cover songs
for i = 1:10
for j = 1:10
%shift code in case 1 track is transposed from another track
c1 = sum(C(:,:,i), 2);
c2 = sum(C(:,:,j), 2);
shift = zeros(3,1);
shift(1) = corr(c1(2:12), c2(1:11));
shift(2) = corr(c1(1:11), c2(1:11));
shift(3) = corr(c1(1:11), c2(2:12));
[dummy, ishift] = max(shift);
switch ishift
case 1
D(1:11,:,i) = C(2:12,:,i); D(12,:,i) = C(1,:,i); C(:,:,i) =
D(:,:,i);
case 3
D(1:11,:,j) = C (2:12,:,j); D(12,:,j) = C(1,:,j); C(:,:,j) =
D(:,:,j);
end
CC = corr((C(:,:,i)'), (C(:,:,j)')); %calulate correlation
cc = diag(CC); %diag to remove uneccesary functions
score(i,j) = sum(abs(cc));
score(i,i) = 1; %produce score
imagesc(score);
end
End
ifz.m
function[iFz, frameSize,nFrames] = ifz_3(signal,samplingRate,lengthFFT)
%function ifz_3 calculates the instantaneous frequency of a vector of
%signal from a song, samplingRate is the sampling rate of the song,
%lengthFFT is the number of frames used to caclculate the instantaneous
%frequency
frameSize = lengthFFT/2; % overlap by N/2 frames
frameSize2 = lengthFFT/4;
s = length(signal);
nFrames = 1 + floor((s - (frameSize))/(frameSize2)); %number of frames to
anaylze
T = (frameSize)/samplingRate;
win = 0.5*(1-cos([0:((frameSize)-1)]/(frameSize)*2*pi)); %raised cosing window
dwin = -pi / T * sin([0:((frameSize)-1)]/(frameSize)*2*pi);
iFz = zeros(frameSize,1);
iFz = zeros(frameSize, nFrames);
S = zeros(frameSize, nFrames);
nmw1 = floor(frameSize2);
nmw2 = frameSize - nmw1;
omega = 2*pi*[0:(lengthFFT-1)]*samplingRate/lengthFFT; %base frequency
%for loop to calculate instantaneous frquency
for ifrm = 1:nFrames
u = signal((ifrm-1)*(frameSize2) + [1:(frameSize)]);
wu = win.*u; du = dwin.*u;
wu = [zeros(1,nmw1),wu,zeros(1,nmw2)];
du = [zeros(1,nmw1),du,zeros(1,nmw2)];
DU = fft(fftshift(du));
WU = fft(fftshift(wu));
domega = imag(DU .* conj(WU)) ./(abs(WU).*abs(WU) + (abs(WU)==0));
ifz = (omega + domega)./(2*pi);
iFz (:,ifrm) = ifz(1:frameSize)';
%S (:,ifrm) = WU(1:frameSize)'*norm;
end
end
chromatic.m (cover.m)
function [C] = chromatic (signal,samplingRate,lengthFFT)
% Computes the 12 x nframes chroma representation
% The computation procedes in two steps:
% 1) an estimate of the instantaneous frequency is computed using the
derivative of the
% phase.
% 2) the energy associated with the instantaneous frequency is folded back
into the octave
% defined by A0.
%
% INPUT:
% signal: a row vector of size 1 x nsamples
% samplingRate: the sampling rate in Hz
% lengthFFT: the length of each FFT analysis, this is N in the
project guidebook
%
% OUTPUT:
% C: matrix of chroma of size 12 x (2*(nsamples/lengthFFT -1)) if
the overlap between
% the windows is lengthFFT/2.
if (size (signal,1) > size(signal,2))
error (['Error in chromatic: \n'...
'\tThe input variable %s is currently a column vector. \n\tIt
should be a row vector.\n'...
'\tPlease consider transposing %s.'],inputname(1), inputname(1));
end
A0 = 27.5;
nchr = 12; % number of semi tones
% parameters of the Fourier analysis
frameSize = lengthFFT/2;
frameSize2 = lengthFFT/4;
% get the number of frames
s = length(signal);
nFrames = 1 + floor((s - (frameSize))/(frameSize2));
T = (frameSize)/samplingRate;
% build the window for the analysis
win = 0.5*(1-cos([0:((frameSize)-1)]/(frameSize)*2*pi));
dwin = -pi / T * sin([0:((frameSize)-1)]/(frameSize)*2*pi);
% used later for normalizing purposes
norm = 2/sum(win);
% memory allocation
iFz = zeros(frameSize,1);
iFz = zeros(frameSize, nFrames);
S = zeros(frameSize, nFrames);
nmw1 = floor(frameSize2);
nmw2 = frameSize - nmw1;
omega = 2*pi*[0:(lengthFFT-1)]*samplingRate/lengthFFT;
% computation of the instantaneous frequency, using the analytical expression
of the derivative of the phase
for ifrm = 1:nFrames
u = signal((ifrm-1)*(frameSize2) + [1:(frameSize)]);
wu = win.*u; du = dwin.*u;
wu = [zeros(1,nmw1),wu,zeros(1,nmw2)];
du = [zeros(1,nmw1),du,zeros(1,nmw2)];
DU = fft(fftshift(du));
WU = fft(fftshift(wu));
domega = imag(DU .* conj(WU)) ./(abs(WU).*abs(WU) + (abs(WU)==0));
ifz = (omega + domega)./(2*pi);
iFz (:,ifrm) = ifz(1:frameSize)';
S (:,ifrm) = WU(1:frameSize)'*norm;
end;
% now we find the local maxima of the instantaneous frequency
f_ctr = 1000;
f_sd = 1;
f_ctr_log = log2(f_ctr/A0);
fminl = oct2hz(hz2oct(f_ctr)-2*f_sd);
fminu = oct2hz(hz2oct(f_ctr)-f_sd);
fmaxl = oct2hz(hz2oct(f_ctr)+f_sd);
fmaxu = oct2hz(hz2oct(f_ctr)+2*f_sd);
minbin = round(fminl * (lengthFFT/samplingRate) );
maxbin = round(fmaxu * (lengthFFT/samplingRate) );
% compute the first order derivative along the frequency direction to detect
plateaux
ddif = [iFz(2:maxbin, :);iFz(maxbin,:)] - [iFz(1,:);iFz(1:(maxbin-1),:)];
% detect the frequencies around which the IF is changing very slowly, if at
all
dgood = abs(ddif) < .75*samplingRate/lengthFFT;
dgood = dgood .* ([dgood(2:maxbin,:);dgood(maxbin,:)] > 0 |
[dgood(1,:);dgood(1:(maxbin-1),:)] > 0);
p = zeros(size(dgood));
m = zeros(size(dgood));
% try to fit a second order polynomial locally
for t = 1:size(iFz,2)
ds = dgood(:,t)';
lds = length(ds);
st = find(([0,ds(1:(lds-1))]==0) & (ds > 0));
en = find((ds > 0) & ([ds(2:lds),0] == 0));
npks = length(st);
frqs = zeros(1,npks);
mags = zeros(1,npks);
for i = 1:length(st)
bump = abs(S(st(i):en(i),t));
frqs(i) = (bump'*iFz(st(i):en(i),t))/(sum(bump)+(sum(bump)==0));
mags(i) = sum(bump);
if frqs(i) > fmaxu
mags(i) = 0;
frqs(i) = 0;
elseif frqs(i) > fmaxl
mags(i) = mags(i) * max(0, (fmaxu - frqs(i))/(fmaxu-fmaxl));
end
if frqs(i) < fminl
mags(i) = 0;
frqs(i) = 0;
elseif frqs(i) < fminu
mags(i) = mags(i) * (frqs(i) - fminl)/(fminu-fminl);
end
if frqs(i) < 0
mags(i) = 0;
frqs(i) = 0;
end
end
bin = round((st+en)/2);
p(bin,t) = frqs;
m(bin,t) = mags;
end
ncols = size (p,2);
Pocts = hz2oct(p+(p==0));
Pocts(p(:)==0) = 0;
nzp = find(p(:)>0);
[hn,hx] = hist(nchr*Pocts(nzp)-round(nchr*Pocts(nzp)),100);
centsoff = hx(find(hn == max(hn)));
Pocts(nzp) = Pocts(nzp) - centsoff(1)/nchr;
PoctsQ = Pocts;
PoctsQ(nzp) = round(nchr*Pocts(nzp))/nchr;
Pmapc = round(nchr*(PoctsQ - floor(PoctsQ)));
Pmapc(p(:) == 0) = -1;
Pmapc(Pmapc(:) == nchr) = 0;
C = zeros(nchr,ncols);
for t = 1:ncols;
C(:,t)=(repmat([0:(nchr-1)]',1,size(Pmapc,1))==repmat(Pmapc(:,t)',nchr,1))*m(:
,t);
end
return;
function [oct] = hz2oct(freq, A440)
if nargin < 2;
A440 = 440;
end
oct = log(freq./(A440/16))./log(2);
return;
function [hz] = oct2hz(octs,A440)
if nargin < 2;
A440 = 440;
end
hz = (A440/16).*(2.^octs);
return;